Random Notes:

Performance Analysis and Scaling

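Strong scaling vs weak scaling

Strong scaling: $S = T_1 / T_N$ (total problem size stays fixed as the processor count $N$ grows)

Weak scaling: $S = T_1 \times N / T_N$ (problem size grows in proportion to $N$, so the work per processor stays fixed)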

Weak scaling is generally easier to achieve. Because the work per processor stays fixed, the communication overhead per processor typically stays roughly constant in a weak scaling experiment.

Strong scaling can be harder because the size and shape of each processor's share of the problem change as processors are added. This can affect e.g. cache fit, vectorisation, communication, etc.

We can capture how good the scaling is as efficiency, the ratio of speedup to processor count: $E = S/N$. E.g. 100% efficiency means that doubling the processor count doubles the speed.
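A minimal sketch of the speedup and efficiency calculation for a strong scaling experiment. The timings are made-up illustrative values, not measurements, and the function names are hypothetical.

```python
def speedup(t1, tn):
    """Strong-scaling speedup: runtime on 1 processor divided by runtime on n."""
    return t1 / tn


def efficiency(t1, tn, n):
    """Efficiency: speedup divided by processor count (1.0 means 100%)."""
    return speedup(t1, tn) / n


# Hypothetical runtimes in seconds, keyed by processor count (illustrative only).
timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}
t1 = timings[1]

for n, tn in sorted(timings.items()):
    print(f"{n:>3} procs: speedup {speedup(t1, tn):5.2f}, "
          f"efficiency {efficiency(t1, tn, n):6.1%}")
```

With these numbers the efficiency drops as processors are added, which is the usual shape of a strong scaling curve.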

Plotting speedup is generally better than plotting runtime, as it lets you compare different problem sizes. You should still include key (not all) values in a table.

The operational intensity ($OI$) is the number of FLOPs divided by the number of bytes accessed.
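As a worked example, here is a sketch of the OI for a double-precision AXPY loop, `y[i] = a * x[i] + y[i]`. The kernel choice and the byte counts are assumptions for illustration: each element is taken to cost 2 FLOPs (one multiply, one add) and 3 memory accesses of 8 bytes (read `x[i]`, read `y[i]`, write `y[i]`), ignoring any cache reuse.

```python
def operational_intensity(flops, bytes_accessed):
    """FLOPs performed divided by bytes accessed."""
    return flops / bytes_accessed


n = 1_000_000                 # number of elements (arbitrary)
flops = 2 * n                 # one multiply and one add per element
bytes_accessed = 3 * 8 * n    # read x, read y, write y; 8 bytes per double

oi = operational_intensity(flops, bytes_accessed)
print(f"OI = {oi:.4f} FLOPs/byte")
```

Under these assumptions the OI is 1/12 FLOPs per byte regardless of `n`, since both counts grow linearly with the array length.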

The Roofline model gives the attainable performance as $R(OI) = \min(\text{peak FLOPs/s}, OI \times \text{peak memory bandwidth})$
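The formula above can be sketched directly in code. The machine numbers here (1 TFLOP/s peak compute, 100 GB/s peak bandwidth) are hypothetical, and the "ridge" variable is a name introduced for this example: it is the OI at which the two arguments of the minimum are equal.

```python
def roofline(oi, peak_flops, peak_bandwidth):
    """Attainable FLOPs/s: the lower of the compute roof and the memory roof."""
    return min(peak_flops, oi * peak_bandwidth)


# Hypothetical machine parameters, for illustration only.
PEAK_FLOPS = 1.0e12       # FLOPs/s
PEAK_BANDWIDTH = 1.0e11   # bytes/s

# OI where the memory roof meets the compute roof.
ridge = PEAK_FLOPS / PEAK_BANDWIDTH

for oi in (0.125, 1.0, 10.0, 64.0):
    bound = "memory-bound" if oi < ridge else "compute-bound"
    perf = roofline(oi, PEAK_FLOPS, PEAK_BANDWIDTH)
    print(f"OI {oi:7.3f}: {perf:.3e} FLOPs/s ({bound})")
```

Kernels with OI below the ridge are limited by memory bandwidth, and those above it are limited by peak compute.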

You can graph this with FLOPs/s on the y-axis and OI on the x-axis, usually with both axes on a logarithmic scale.
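A sketch of generating the data for such a graph, using only the standard library. The OI values are spaced logarithmically (powers of two here), which suits log-scaled axes. The machine parameters and the function name are hypothetical. The resulting pairs can be passed to any plotting tool.

```python
import math


def roofline_points(peak_flops, peak_bandwidth,
                    oi_min=2**-4, oi_max=2**8, count=13):
    """Return (OI, attainable FLOPs/s) pairs at log-spaced OI values."""
    lo, hi = math.log2(oi_min), math.log2(oi_max)
    step = (hi - lo) / (count - 1)
    points = []
    for i in range(count):
        oi = 2.0 ** (lo + i * step)
        points.append((oi, min(peak_flops, oi * peak_bandwidth)))
    return points


# Hypothetical machine: 1 TFLOP/s peak compute, 100 GB/s peak bandwidth.
points = roofline_points(1.0e12, 1.0e11)
for oi, perf in points:
    print(f"x = {oi:10.4f} FLOPs/byte   y = {perf:.3e} FLOPs/s")
```

The y values rise in proportion to OI until they reach the peak compute rate, then stay flat, which gives the plot its roof shape.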